Major Project: Visualizations of African Women

PSDV-4200
Women and Development in Africa
Rebekah Doochin
Fall 2020



About this Project

For my final project, I have used graphs and visualizations generated with data on women in African countries, and recreated depictions of those trends in photographs. I think that using human models and real-world objects to show some of the high rates of violence and oppression against women will be a more impactful representation than computer-generated graphs alone. When doing research, for this class and others, I have found that it is easy to accidentally dehumanize a situation by focusing on the numbers. My goal with this project is to draw connections between the data and the people it describes.



Important Information

Throughout this project you will notice that for many of the datasets, I have filtered out data from before the year 2010. This is because I believe that looking at the last 10 years of data will provide the most accurate insight into what the quality of life is like for women in African countries today, while still giving me enough information to work with.

Below, I import CSV and Excel files containing data on relevant indicators of quality of life for women in African countries, as well as data on GDP and other economic indicators.

In [141]:
%matplotlib inline
import pandas as pd
import numpy as np  
from IPython.display import display_html
import matplotlib.pyplot as plt
from IPython.display import Image

# The ratio of females to males who are literate. The ages of those surveyed
# range from 15-24. 
literacy_rates = pd.read_csv('../data/ratio_of_young_literate_females_to_males_percent_ages_15_24.csv')

# Women who believe husbands are justified in beating their wives for any of the following 5 reasons:
# arguing with him, burning the food, neglecting the children, going out without telling him, or 
# refusing him sex. 
justified_violence = pd.read_csv('../data/sg_vaw_reas_zs.csv')

# Proportion of women who have been subject to physical or sexual violence in the last 12 months,
# as a percent of women ages 15-49. 
violence_df = pd.read_csv('../data/sg_vaw_1549_zs.csv')

# OECD data on GDP and related indicators for countries around the world. The oil rents data
# only goes up to 2017.
gdp_df = pd.read_csv('../data/OECDdata.csv')

# World Bank data on the proportion of seats held by women in national parliaments (%).
parliament_df = pd.read_excel(r'../data/women_in_parliament_data.xls')

# Recent data on oil rents as a percent of GDP. 
oil_gdp_recent_df = pd.read_csv('../data/recent_oil_rents_of_gdp.csv')

Below is a list of all the countries in Africa. This will be helpful when filtering data out of larger world datasets. I also begin to filter out data from before the year 2010 for the dataframe containing information on violence against women.
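The next cell drops the year columns 2000 through 2009 by name. A minimal sketch of an alternative, assuming (as with `violence_df`) wide data with one column per year whose labels are strings such as '2010', keeps every year column from a chosen year onward instead of listing the ones to delete. The dataframe, the country names, and the helper `keep_recent_years` are hypothetical.

```python
import pandas as pd

# Hypothetical wide-format data: one row per country, one column per year
# (column labels are strings, as they are when read from a CSV header).
wide = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "2008": [30.0, 5.0, None],
    "2014": [25.0, 4.0, None],
    "2017": [None, 3.5, 28.0],
})

def keep_recent_years(df, first_year, id_cols=("country",)):
    """Keep the id columns plus every year column labelled first_year or later."""
    year_cols = [c for c in df.columns
                 if c not in id_cols and str(c).isdigit() and int(c) >= first_year]
    return df[list(id_cols) + year_cols]

# Stand-in for the African_countries list used in the notebook.
countries_of_interest = ["Country A", "Country C"]

recent = keep_recent_years(wide, 2010)
recent_subset = recent[recent["country"].isin(countries_of_interest)]
print(recent_subset)
```

Selecting the columns to keep, rather than deleting a hard-coded range, also works if a dataset happens to contain years before 2000.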

In [142]:
African_countries = ['Algeria', 'Angola', 'Benin', 'Botswana', 'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cameroon',
                     'Central African Republic (CAR)', 'Chad', 'Comoros', 'Congo, Democratic Republic of the', 
                     'Congo, Republic of the', "Cote d'Ivoire", 'Djibouti', 'Egypt', 'Equatorial Guinea', 'Eritrea', 
                     'Eswatini (formerly Swaziland)', 'Ethiopia', 'Gabon', 'Gambia', 'Ghana', 'Guinea', 'Guinea-Bissau', 
                     'Kenya', 'Lesotho', 'Liberia', 'Libya', 'Madagascar', 'Malawi', 'Mali', 'Mauritania', 'Mauritius', 
                     'Morocco', 'Mozambique', 'Namibia', 'Niger', 'Nigeria', 'Rwanda', 'Sao Tome and Principe', 
                     'Senegal', 'Seychelles', 'Sierra Leone', 'Somalia', 'South Africa', 'South Sudan', 'Sudan', 
                     'Tanzania', 'Togo', 'Tunisia', 'Uganda', 'Zambia', 'Zimbabwe']

# Using the last 10 years of data. 
for i in range(2000,2010):
    del violence_df[str(i)]

# Using only the countries in Africa.
africa_violence = violence_df.loc[violence_df["country"].isin(African_countries)]

Rates of Violence

Only 29 of the 54 countries in Africa appear in the dataset on the percentage of women between the ages of 15 and 49 who had been subject to physical or sexual violence in the last 12 months.
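Part of this gap could come from naming rather than missing surveys: `isin` only matches exact strings, so an entry such as 'Central African Republic (CAR)' in `African_countries` would not match a dataset that spells it 'Central African Republic'. A small sketch of a check for this, where the dataset spellings and the helper `unmatched_names` are hypothetical:

```python
import pandas as pd

# Hypothetical dataset whose spellings differ from the notebook's country list.
df = pd.DataFrame({"country": ["Kenya", "Central African Republic", "Eswatini", "Burundi"]})

# A few entries from the notebook's African_countries list.
african_countries = ["Kenya", "Central African Republic (CAR)",
                     "Eswatini (formerly Swaziland)", "Burundi", "Chad"]

def unmatched_names(names, df, col="country"):
    """Return the names in `names` that never appear exactly in df[col]."""
    present = set(df[col])
    return sorted(n for n in names if n not in present)

print(unmatched_names(african_countries, df))
```

Any name this returns is either genuinely absent from the dataset or spelled differently, which is worth checking by hand before counting it as missing data.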

In [143]:
len(africa_violence)
Out[143]:
29

It is important to note that there is a large amount of data missing from various countries and years. This is not because there was no violence, but rather because data relating to this subject matter can be difficult to gather. It is possible that countries with some of the highest rates of violence against women also had high rates of violence overall, which would have made data pertaining to women especially difficult to gather. It is also possible that women in these studies have underreported the rates or severity of violence for fear of social repercussions.
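One way to see how much is missing is to count, for each country, how many years actually have a value. Note that `len(africa_violence)` counts rows, so a country listed in the file with no values in any year would still be counted. A minimal sketch on a hypothetical wide dataframe shaped like `africa_violence`:

```python
import pandas as pd

# Hypothetical wide-format data with gaps, shaped like africa_violence.
wide = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "2012": [21.0, None, None],
    "2015": [None, 18.5, None],
    "2017": [19.0, None, None],
})

year_cols = [c for c in wide.columns if c != "country"]
# Number of years with a reported value for each country.
coverage = wide.set_index("country")[year_cols].notna().sum(axis=1)
print(coverage)

# Countries that appear in the file but have no value in any year.
no_data = list(coverage[coverage == 0].index)
print(no_data)
```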

Displayed below is a scatter plot of the rates at which women had been subject to violence in various African countries. The data comes from a dataset of the proportion of women who have been subject to physical or sexual violence in the last 12 months. These values are percentages of the surveyed women ages 15 to 49 in each country.

In [144]:
# The code below "melts" the data so that all of the years are in a single column and are easier
# to work with. 
melted_violence_df = pd.melt(africa_violence,
                        ["country"],
                        var_name = "year",
                        value_name = "rate of violence")

# Drawing a graph and storing the result.
scatter = melted_violence_df.plot.scatter(x='year', y='rate of violence', figsize=(5,8))
scatter.set_ylabel("Violence Rates (% of total women in country ages 15-49)")
scatter.set_xlabel("Year")

The scatter plot shown above is helpful for seeing general trends in rates of women subject to violence on the African continent. To get more detailed information, we'll look at the bar graph shown below which displays the proportion of women subject to violence by country and year.
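Each bar in the graph below is one country-year. A minimal sketch of the same melt-then-label steps on a toy dataframe (hypothetical values) shows how the wide table becomes one row per country and year, and how the "Country (Year)" labels are built:

```python
import pandas as pd

# Hypothetical wide-format data (one column per year).
wide = pd.DataFrame({
    "country": ["Country A", "Country B"],
    "2014": [25.0, None],
    "2017": [None, 28.0],
})

# Wide -> long: one row per (country, year) pair.
long = pd.melt(wide, id_vars=["country"], var_name="year", value_name="rate of violence")

# Drop country-years without a value, then build "Country (Year)" bar labels.
long = long.dropna()
long["label"] = long["country"] + " (" + long["year"] + ")"
print(long[["label", "rate of violence"]])
```

Because the year values come from the CSV column headers, they are already strings, so they can be concatenated directly.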

In [145]:
violence_present_df = melted_violence_df.dropna()
violence_present_df = violence_present_df.astype({"year":str, "country":str})
violence_present_df["Country"] = violence_present_df["country"] + " (" + violence_present_df["year"] + ")"
violence_present_df.plot(x='Country', y='rate of violence', kind='bar', figsize=(15,5),
                        title="Percent of Women Subject to Violence in the last 12 months")
In [146]:
# Getting the most recent data for Burundi.
burundi = violence_present_df.loc[violence_present_df['country'] == 'Burundi']
latest = burundi.sort_values('year').iloc[-1]  # most recent survey year

# Pie chart, where the slices will be ordered and plotted counter-clockwise:
labels = 'Burundi ' + latest['year'], ' '
sizes = [latest['rate of violence'], 100 - latest['rate of violence']]
explode = (0.1, 0)  # only "explode" the 1st slice

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()

The pie chart above represents the 27.9% of women in Burundi who were subject to violence in 2017. This is the highest rate of violence in the most recent year we see in our data. To get a better idea of how accurate this data is, we will explore a dataset containing information on the proportion of women who believe their husbands are justified in beating them for one of five reasons: arguing with him, burning the food, neglecting the children, going out without telling him, or refusing him sex.

The image displayed below is a recreation of the pie chart using two of my friends who were willing to help with the project. I asked them to wear hoodies and gloves to conceal their identities for the photos; in doing this I was trying not to make the stories of other women my own. My friends and I do not look like the women who were surveyed to get this data, nor have we had similar experiences to theirs. It was important that this recreation promoted learning from their stories while not attempting to tell them myself.

In [147]:
Image(filename = "../images/IMG_1106 2.JPG", width = 300, height = 300)

Looking at "Justified" Violence

In [148]:
# Tidying the dataframe of justified violence.
justified_violence = pd.melt(justified_violence,
                        ["country"],
                        var_name = "year",
                        value_name = "rate of violence")
justified_violence = justified_violence.dropna()
In [149]:
# Scatter plot. Drawing a graph and storing the result.
scatter = justified_violence.plot.scatter(x='year', y='rate of violence', figsize=(15,8), ylim=(0.0, 100.0))
scatter.set_ylabel("Percent of women who believe husbands are justified in beating them")
scatter.set_xlabel("year")

The scatter plot above shows the percentage of women who believe their husbands are justified in beating them for one of the five reasons previously mentioned. I included data from before 2010 here to show that, unlike some other indicators, this one does not appear to have changed much over time. I predict that countries in which this indicator was high in 1999 still have high rates of both this indicator and overall violence against women.
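This prediction could be checked by comparing each country's earliest and latest values. A minimal sketch, assuming long-format data like `justified_violence` before any year filter (the values here are hypothetical): it keeps countries surveyed more than once and computes Spearman's rank correlation by hand, using `rank()` followed by `corr()`, between earliest and latest values. A value near 1 would mean countries keep their relative positions over time.

```python
import pandas as pd

# Hypothetical long-format data shaped like justified_violence
# (country, year, percent of women who consider beating justified).
df = pd.DataFrame({
    "country": ["Country A", "Country A", "Country B", "Country B",
                "Country C", "Country C", "Country D"],
    "year": ["1999", "2016", "2000", "2017", "2000", "2014", "2015"],
    "rate of violence": [80.0, 62.0, 20.0, 15.0, 45.0, 40.0, 30.0],
})

# Sorting first means "first"/"last" are the earliest and latest surveys.
df = df.sort_values(["country", "year"])
first_last = df.groupby("country")["rate of violence"].agg(["first", "last"])

# Keep only countries with more than one survey.
counts = df.groupby("country").size()
repeated = first_last.loc[counts > 1]
print(repeated)

# Spearman's rank correlation: Pearson correlation of the ranks.
rho = repeated["first"].rank().corr(repeated["last"].rank())
print(rho)
```

This compares rankings only; a high correlation says countries stay in the same order, not that the levels themselves are unchanged.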

In [150]:
# Bar graph. 
# Keeping only the most recent surveys (2015 onward).
justified_violence = justified_violence[justified_violence['year'].astype(int) >= 2015]
        
justified_violence = justified_violence.astype({"year":str, "country":str})
justified_violence["Country + Year"] = justified_violence["country"] + " (" + justified_violence["year"] + ")"
justified_violence.plot(x='Country + Year', y='rate of violence', kind='bar', figsize=(15,5), ylim=(0.0, 100.0),
                        title="Percent of women who believe husbands are justified in beating them")

The chart above shows the most recent data (2015 onward), broken down by country and year, to give a better idea of African women's attitudes toward domestic violence today. Shown below is a recreation of this graph using burnt pasta, a reference to burning the food, one of the five reasons for which a woman might think her husband is justified in beating her.

In [151]:
Image(filename = "../images/IMG_1150 2.JPG", width = 850, height = 400)
In [152]:
# Getting the most recent data for Burundi.
burundi = justified_violence.loc[justified_violence['country'] == 'Burundi']
latest = burundi.sort_values('year').iloc[-1]  # most recent survey year

# Pie chart, where the slices will be ordered and plotted counter-clockwise:
labels = 'Burundi ' + latest['year'], ' '
sizes = [latest['rate of violence'], 100 - latest['rate of violence']]
explode = (0.1, 0)  # only "explode" the 1st slice

fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.

plt.show()

The pie chart above represents the 61.8% of women in Burundi in 2017 who believed their husbands were justified in beating them for arguing with him, burning the food, neglecting the children, going out without telling him, or refusing him sex. This is far higher than the 27.9% of women who reported experiencing physical or sexual violence in the same year. The two datasets measure different things (attitudes versus reported experiences), but the size of the gap leads me to believe that the dataset on physical or sexual violence within the last 12 months may undercount violence against women. Below is the recreation.
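The comparison above can be made systematic for every country-year that appears in both datasets. A minimal sketch, using only the two Burundi 2017 values from the pie charts; in the notebook the two inputs would be the melted violence and "justified" violence dataframes. Because both use the column name `rate of violence`, `suffixes` keeps them apart after the merge.

```python
import pandas as pd

# Burundi 2017 values from the two pie charts; other rows would come from the
# melted violence and "justified" violence dataframes.
experienced = pd.DataFrame({"country": ["Burundi"], "year": ["2017"],
                            "rate of violence": [27.9]})
justified = pd.DataFrame({"country": ["Burundi"], "year": ["2017"],
                          "rate of violence": [61.8]})

# Match the two indicators on country and year; suffixes rename the shared column.
both = experienced.merge(justified, on=["country", "year"],
                         suffixes=(" experienced", " justified"))
both["gap (percentage points)"] = (both["rate of violence justified"]
                                   - both["rate of violence experienced"])
print(both)
```

The gap is a difference between answers to two different questions, so it is a prompt for caution rather than a direct measure of underreporting.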

In [153]:
Image(filename = "../images/IMG_1110 2.JPG", width = 300, height = 300)

Literacy Rates

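Displayed in the bar graphs below are the 10 lowest and the 10 highest ratios of literate females to literate males. In the study from which the data was derived, literacy was defined as being able to read and write short statements about one's everyday life. The people surveyed were between 15 and 24 years old, and the data used is from 2000-2010. In the second graph, which shows the 10 highest ratios, you will notice that some ratios exceed 1.0. If, as the indicator's name suggests, the ratio compares the share of young women who are literate with the share of young men who are literate, a value above 1.0 means that young women in that country were more likely to be literate than young men. If it instead compares the numbers of literate women and men, it could also reflect there being more young women than young men in the population.
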
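Whichever definition the dataset uses, a ratio above 1.0 does not require there to be more women than men. A worked example with hypothetical counts (not from the dataset) compares the two possible readings of "ratio of literate females to males" for a country with fewer young women than young men:

```python
# Hypothetical counts for people ages 15-24 in one country (not from the dataset).
women, literate_women = 900, 810    # 90% of young women are literate
men, literate_men = 1000, 750       # 75% of young men are literate

# Reading 1: ratio of literacy rates (female rate / male rate).
rate_ratio = (literate_women / women) / (literate_men / men)

# Reading 2: ratio of the numbers of literate people.
count_ratio = literate_women / literate_men

print(round(rate_ratio, 3), round(count_ratio, 3))
```

Both ratios come out above 1.0 even though there are fewer young women than young men, because a much larger share of the women are literate.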
In [154]:
literacy_rates = pd.melt(literacy_rates,
                        ["country"],
                        var_name = "year",
                        value_name = "literacy rate")
literacy_rates = literacy_rates.dropna()

# Keeping only data from 2000 onward.
literacy_rates = literacy_rates[literacy_rates['year'].astype(int) >= 2000]

# Using only the countries in Africa.
literacy_rates = literacy_rates[literacy_rates['country'].isin(African_countries)]
In [155]:
# Gets the 10 lowest female-to-male literacy ratios from 2000-2010.

lowest_lit_df = literacy_rates.nsmallest(10, ['literacy rate'])
lowest_lit_df = lowest_lit_df.astype({"year":str, "country":str})
lowest_lit_df["Country + Year"] = lowest_lit_df["country"] + " (" + lowest_lit_df["year"] + ")"
lowest_lit_df.plot(x='Country + Year', y='literacy rate', kind='bar', figsize=(15,5),ylim=(0.0, 1.5), 
                   title="10 Lowest Literacy Ratios")
In [156]:
Image(filename = "../images/IMG_1159 2.JPG", width = 850, height = 400)
In [157]:
# Gets the 10 highest female-to-male literacy ratios from 2000-2010.

highest_lit_df = literacy_rates.nlargest(10, ['literacy rate'])
highest_lit_df = highest_lit_df.astype({"year":str, "country":str})
highest_lit_df["Country + Year"] = highest_lit_df["country"] + " (" + highest_lit_df["year"] + ")"
highest_lit_df.plot(x='Country + Year', y='literacy rate', kind='bar', figsize=(15,5),ylim=(0.0, 1.5),
                   title="10 Highest Literacy Ratios")
In [158]:
Image(filename = "../images/IMG_1162 2.JPG", width = 850, height = 400)

Relationship between oil rents and rates of violence


In this section I am going to use a dataset with information on the proportion of women who have been subject to violence and merge this information into a dataframe that contains values representing the percentage of a country's GDP that comes from oil rents. I predict that there will be a correlation between countries with large amounts of oil revenues and rates of women subject to violence.
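An inner merge silently drops every country-year that is not in both tables. A minimal sketch of how to see what is lost, using hypothetical tables shaped like `melted_violence_df` and `oil_rents_df`: an outer merge with `indicator=True` adds a `_merge` column labelling each row `both`, `left_only`, or `right_only`. The merge keys must have the same type in both tables (here, integer years).

```python
import pandas as pd

# Hypothetical long-format tables shaped like melted_violence_df and oil_rents_df.
violence = pd.DataFrame({"Country": ["Country A", "Country B", "Country C"],
                         "Year": [2014, 2015, 2016],
                         "rate of violence": [14.0, 20.0, 25.0]})
oil = pd.DataFrame({"Country": ["Country A", "Country B", "Country D"],
                    "Year": [2014, 2011, 2014],
                    "Oil rents (% of GDP)": [6.5, 2.0, 20.0]})

# Outer merge with an indicator column showing where each row came from.
check = violence.merge(oil, on=["Country", "Year"], how="outer", indicator=True)
print(check["_merge"].value_counts())

# The inner merge keeps only the rows marked "both".
matched = check[check["_merge"] == "both"].drop(columns="_merge")
print(matched)
```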

In [159]:
# Tidying up the GDP data so that it is easier to look at various indicators. Examples of indicators in this dataset
# include oil rents, total natural resource rents, and mineral rents as a percent of each country's total GDP. 
gdp_df = gdp_df.drop(["COUNTRY", "INDICATOR", "TABLE", "YEAR", "Flag Codes", "Flags"], axis = 1)

# Using only the countries in Africa.
africa_gdp_df = gdp_df.loc[gdp_df["Country"].isin(African_countries)]

# Isolating the oil rents indicator so that it is the only one in the current dataframe.
oil_rents_df = africa_gdp_df.loc[africa_gdp_df["Indicator"] == "Oil rents (% of GDP)"]

# Changing column names to accurately reflect its values and dropping unnecessary columns. 
oil_rents_df = oil_rents_df.rename(columns={"Value":"Oil rents (% of GDP)"})
oil_rents_df = oil_rents_df.drop(["Indicator", "Table name"], axis = 1)
In [160]:
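# Tidying the rates of violence data so that we can more easily merge it. 
melted_violence_df = melted_violence_df.rename(columns={"country":"Country", "year":"Year"})

# Making sure the merge keys have the same type in both dataframes.
melted_violence_df = melted_violence_df.astype({"Year": int})
oil_rents_df = oil_rents_df.astype({"Year": int})

# Combining datasets to see if there is a relationship between oil rents and rates of violence.
oil_to_violence_df = melted_violence_df.merge(oil_rents_df, on=["Country", "Year"], how="inner")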

# dropping rows with nans
oil_to_violence_df = oil_to_violence_df.dropna()

# Preparing to display the two dataframes side by side.
df1 = oil_to_violence_df.sort_values(ascending=False, by=["Oil rents (% of GDP)"])
df2 = oil_rents_df.sort_values(ascending=False, by=["Oil rents (% of GDP)"])[:13]

df1_styler = df1.style.set_table_attributes("style='display:inline'").set_caption('Oil and Violence')
df2_styler = df2.style.set_table_attributes("style='display:inline'").set_caption('Oil Rents')
In [161]:
display_html(df1_styler._repr_html_() + " .   .   .   ." + df2_styler._repr_html_(), raw=True)
Oil and Violence
Country        Year  rate of violence  Oil rents (% of GDP)
Egypt          2014  14.0              6.754
Cameroon       2014  32.7              4.369
Burkina Faso   2010  9.3               0.000
Comoros        2012  4.9               0.000
Gambia         2013  7.3               0.000
Kenya          2014  25.5              0.000
Rwanda         2015  20.7              0.000
Zimbabwe       2015  19.9              0.000
Ethiopia       2016  19.8              0.000
Malawi         2016  24.3              0.000
Uganda         2016  29.9              0.000
Burundi        2017  27.9              0.000
Senegal        2017  12.2              0.000

Oil Rents
Country  Year  Oil rents (% of GDP)
Sudan    2008  22.913
Sudan    2011  18.380
Sudan    2007  18.247
Sudan    2005  17.405
Sudan    2006  16.988
Sudan    2010  13.610
Sudan    2004  13.448
Egypt    2008  12.469
Egypt    2005  12.193
Egypt    2006  12.106
Sudan    2009  11.028
Egypt    2007  10.958
Sudan    2000  10.138

LEFT: Country-years with data on both the rate of physical or sexual violence against women and oil rents as a percent of GDP, sorted in decreasing order of oil rents. If a country did not have data on both indicators for the same year, it was not included in this table.

RIGHT: The 13 country-years with the highest oil rents as a percent of GDP among the African countries in the OECD data, sorted in decreasing order of oil rents.

CONCLUSIONS: Looking at the Oil and Violence table on the left, there is not enough data to assess whether rates of violence against women are related to the share of a country's GDP that comes from oil revenues. Only two countries in the merged dataframe, Egypt and Cameroon, have nonzero oil rents, and Sudan, the country with the highest oil rents, does not appear in the merged table at all.
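A correlation coefficient can still be computed from the 13 rows of the Oil and Violence table, but with 11 of the 13 oil values equal to zero it mostly reflects where two countries happen to fall, so it should not be read as evidence either way. A short sketch that re-enters the table's values and computes Pearson's r with pandas:

```python
import pandas as pd

# The 13 rows of the "Oil and Violence" table.
table = pd.DataFrame({
    "Country": ["Egypt", "Cameroon", "Burkina Faso", "Comoros", "Gambia", "Kenya",
                "Rwanda", "Zimbabwe", "Ethiopia", "Malawi", "Uganda", "Burundi", "Senegal"],
    "rate of violence": [14.0, 32.7, 9.3, 4.9, 7.3, 25.5,
                         20.7, 19.9, 19.8, 24.3, 29.9, 27.9, 12.2],
    "Oil rents (% of GDP)": [6.754, 4.369] + [0.0] * 11,
})

# How many countries have any oil rents at all.
nonzero = int((table["Oil rents (% of GDP)"] > 0).sum())

# Pearson correlation between the two columns (pandas' default method).
r = table["rate of violence"].corr(table["Oil rents (% of GDP)"])
print(f"{nonzero} countries with nonzero oil rents; Pearson r = {r:.2f}")
```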

Relationship between the percent of women in parliament and other indicators


In [162]:
# Adding the proper column headings to the women in parliament dataset.
# Year headings are read as floats (e.g. 2010.0), so they are converted to ints.
# Building a new list avoids modifying a slice of the dataframe in place.
col_list = [int(c) if isinstance(c, float) else c for c in parliament_df.iloc[2]]
parliament_df.columns = col_list

# Dropping the first three rows which don't contain data. 
parliament_df = parliament_df.iloc[3:]
parliament_df = parliament_df.reset_index()
parliament_df = parliament_df.drop(['Country Code', 'Indicator Name', 'Indicator Code','index'],axis = 1)

# Using only the countries in Africa.
parliament_df = parliament_df.loc[parliament_df["Country Name"].isin(African_countries)]
In [163]:
# Melting the dataset to have a years column.
melted_parliament_df = pd.melt(parliament_df,
                        ["Country Name"],
                        var_name = "Year",
                        value_name = "Women in Parliament (%)")

par_by_country_df = melted_parliament_df.groupby('Country Name')['Women in Parliament (%)']
In [165]:
# Dropping rows without values. 
parliament_present_df = melted_parliament_df.dropna()

# Using data from 2010 onward.
parliament_present_df = parliament_present_df[parliament_present_df['Year'].astype(int) >= 2010]
    
parliament_present_df = parliament_present_df.astype({"Year":str, "Country Name":str})
parliament_present_df["Country"] = parliament_present_df["Country Name"] + " (" + parliament_present_df["Year"] + ")"
parliament_present_df.plot(x='Country Name', y='Women in Parliament (%)', kind='scatter', figsize=(15,5),rot=90)

The graph above shows the proportion of seats held by women in national parliaments. Each point is one year of data since 2010 for that country. Many countries are missing data for various years or only have data for a few of the most recent years.
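To see exactly which countries have gaps, the years of data can be counted per country. A minimal sketch on hypothetical rows shaped like `parliament_present_df` (before its Year column is converted to strings):

```python
import pandas as pd

# Hypothetical long-format rows shaped like parliament_present_df.
par = pd.DataFrame({
    "Country Name": ["Country A", "Country A", "Country A",
                     "Country B", "Country C", "Country C"],
    "Year": [2010, 2015, 2019, 2019, 2018, 2019],
    "Women in Parliament (%)": [10.0, 12.5, 15.0, 30.0, 22.0, 23.0],
})

# Number of years with data since 2010, plus the first and last year, per country.
coverage = par.groupby("Country Name")["Year"].agg(["count", "min", "max"])
print(coverage.sort_values("count"))
```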

In [166]:
par_vi_df = violence_present_df.merge(parliament_present_df, on=["Country"], how="inner")
par_vi_df.plot.scatter(x="Women in Parliament (%)", y="rate of violence", alpha=.5)

The scatter plot above shows the percentage of each country's parliament that is female plotted against the rate of violence in that country, for all the country-years since 2010 with data on both. The rate of violence comes from the dataset of women who had been subject to physical or sexual violence within the last 12 months. Before seeing this graph, I expected an inverse relationship between the two variables, meaning that as the share of women in parliament increased, the rate of violence would decrease. That may be the trend on the right side of the graph (above about 30% women in parliament), although with this little data it is hard to be sure; below that point, there appears to be no relationship between the two variables. With a larger dataset, we might have been able to see more of a correlation.
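A pattern seen by eye can be checked by fitting a line separately on each side of the roughly 30% split. A minimal sketch with hypothetical values (not from `par_vi_df`; in the notebook the two arrays would be its `Women in Parliament (%)` and `rate of violence` columns), using `np.polyfit` for a least-squares line and `np.corrcoef` for Pearson's r:

```python
import numpy as np

# Hypothetical (women in parliament %, rate of violence %) pairs, not the notebook's data.
parl = np.array([10.0, 15.0, 20.0, 25.0, 32.0, 36.0, 40.0, 48.0])
viol = np.array([20.0, 28.0, 15.0, 25.0, 27.0, 24.0, 21.0, 18.0])

threshold = 30.0  # split point chosen by eye from the scatter plot
results = {}
for name, mask in [("below", parl < threshold), ("above", parl >= threshold)]:
    # Degree-1 polyfit returns [slope, intercept] of the least-squares line.
    slope, intercept = np.polyfit(parl[mask], viol[mask], 1)
    r = np.corrcoef(parl[mask], viol[mask])[0, 1]
    results[name] = (slope, r, int(mask.sum()))
    print(f"{name} {threshold}%: slope = {slope:.2f}, r = {r:.2f}, n = {mask.sum()}")
```

Choosing the threshold after looking at the plot tends to exaggerate the pattern, so with only a handful of points on each side the result is a description of this sample rather than a test.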

Below is the recreation of the scatter plot. The graph was painted on a door found in the alley beside my house. My initial hypothesis, that violence against women would decrease as the share of women in parliament increased, reflected the idea that more women hold positions of power in countries where women are more respected. Initially I thought the door was a fitting place for this graph, as it could represent a metaphorical door leading to greater opportunities for women.

In [167]:
Image(filename = "../images/IMG_1130 3.JPG", width = 500, height = 300)

Sources

  • https://www.gapminder.org/data/
    Ratio of females to males ages 15 to 24 who are literate. Literacy is defined as being able to understand, read, and write simple sentences about one's everyday life. Used in this notebook as 'literacy_rates'.

  • https://stats.oecd.org/Index.aspx?DataSetCode=AEO11_COUNTRYNOTES_TAB2_EN#
    World data on various indicators, including gross domestic product in U.S. dollars and oil rents (% of GDP). Used in this notebook as 'gdp_df'.

  • https://data.unicef.org/topic/child-protection/violence/attitudes-and-social-norms-on-violence/
    Women who believe husbands are justified in beating their wives for any of the following five reasons: arguing with him, burning the food, neglecting the children, going out without telling him, or refusing him sex. Used in this notebook as 'justified_violence'.

  • https://apps.who.int/gho/data/view.main.IPVv
    Proportion of women ages 15-49 who have been subject to physical or sexual violence in the last 12 months. Used in this notebook as 'violence_df'.

  • https://data.worldbank.org/indicator/SG.GEN.PARL.ZS
    World Bank data on the proportion of seats held by women in national parliaments (%), 1997-2020. Used in this notebook as 'parliament_df'.